Goto

Collaborating Authors

 alphageometry 2


Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

arXiv.org Artificial Intelligence

LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose \textbf{Seed-Prover}, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves $78.1\%$ of formalized past IMO problems, saturates MiniF2F, and achieves over 50\% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine \textbf{Seed-Geometry}, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.


Google DeepMind takes step closer to cracking top-level maths

The Guardian

Even though computers were made to do maths faster than any human could manage, the top level of formal mathematics remains an exclusively human domain. But a breakthrough by researchers at Google DeepMind has brought AI systems closer than ever to beating the best human mathematicians at their own game. A pair of new systems, called AlphaProof and AlphaGeometry 2, worked together to tackle questions from the International Mathematical Olympiad, a global maths competition for secondary-school students that has been running since 1959. The Olympiad takes the form of six mind-bogglingly hard questions each year, covering fields including algebra, geometry and number theory. The combined efforts of DeepMind's two systems weren't quite in that league.